APEX -- The APL Parallel Executor
Author
Abstract
APEX: the APL Parallel Executor. Robert Bernecky. Master of Science, Graduate Department of Computer Science, University of Toronto, 1997.

APEX is an APL-to-SISAL compiler, generating high-performance, portable, parallel code that executes up to several hundred times faster than interpreted APL, with serial performance of kernels competitive with FORTRAN. Preliminary results indicate that acceptable multi-processor speedup is achievable.

The excellent run-time performance of APEX-generated code arises from attention to all aspects of program execution: run-time syntax analysis is eliminated, setup costs are reduced, algebraic identities and phrase recognition detect special cases, some matrix products exploit a generalization of sparse matrix algebra, and loop fusion and copy optimizations eliminate many array-valued temporaries. In addition, the compiler exploits Static Single Assignment and array morphology, our generalization of data flow analysis to arrays, to generate run-time primitives that use superior algorithms and simpler storage types. Extensions to APL, including rank, cut, and a monadic operand for dyadic reduction, improve compiled and interpreted code performance.
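As a rough illustration of what "loop fusion eliminates array-valued temporaries" means, the sketch below evaluates an expression of the shape `+/2×A+B` in two ways. It is written in Python purely for readability; APEX itself emits SISAL, and the function names `unfused` and `fused` are hypothetical, not taken from the thesis.

```python
def unfused(a, b):
    """Evaluate sum(2 * (a + b)) one primitive at a time, as an interpreter might."""
    t1 = [x + y for x, y in zip(a, b)]  # first array-valued temporary: a + b
    t2 = [2 * x for x in t1]            # second array-valued temporary: 2 * t1
    return sum(t2)                      # reduction over the second temporary


def fused(a, b):
    """Same result in a single pass, with no intermediate arrays allocated."""
    total = 0
    for x, y in zip(a, b):
        total += 2 * (x + y)            # add, scale, and reduce in one loop body
    return total


if __name__ == "__main__":
    a, b = [1, 2, 3], [4, 5, 6]
    print(unfused(a, b), fused(a, b))
```

Both functions compute the same value; the fused form simply avoids allocating and traversing the two intermediate lists, which is the kind of saving the abstract attributes to loop fusion.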
Similar resources
An Approach for Proving the Correctness of Inspector/Executor Transformations
To take advantage of multicore parallelism, programmers and compilers rewrite, or transform, programs to expose loop-level parallelism. Showing the correctness, or legality, of such program transformations enables their incorporation into compilers. However, the correctness of inspector/executor strategies, which develop parallel schedules at runtime for computations with nonaffine array access...
Effectiveness of Message Strip-mining for Regular and Irregular Communication
Languages such as High Performance Fortran are used to implement parallel algorithms by distributing large data structures across a multicomputer system. To hide communication behind computation, we introduce an optimization scheme, message strip-mining. By using this scheme, the communication overhead is almost completely overlapped with the subsequent computation. We have implemented the prop...
Parallelization Techniques for Sparse Matrix Applications
Sparse matrix problems are difficult to parallelize efficiently on distributed memory machines since data is often accessed indirectly. Inspector/executor strategies, which are typically used to parallelize loops with indirect references, incur substantial run-time preprocessing overheads when references with multiple levels of indirection are encountered, a frequent occurrence in sparse matrix al...
Automatic Parallelizing Compiler for Distributed Memory Parallel Computers: New Algorithms to Improve the Performance of the Inspector/executor
A Fast Parallel Graph Partitioner for Shared-Memory Inspector/Executor Strategies
Graph partitioners play an important role in many parallel work distribution and locality optimization approaches. Surprisingly, however, to our knowledge there is no freely available parallel graph partitioner designed for execution on a shared memory multicore system. This paper presents a shared memory parallel graph partitioner, ParCubed, for use in the context of sparse tiling run-time dat...
Publication date: 1997